by CM
Posted on April 19, 2020
In this article, we will leverage the TensorFlow Object Detection API to detect live traffic. In particular, we will use TensorFlow 2 and OpenCV to run live inference on a video stream. The goal is to use the ssd_mobilenet_v1_coco_2017_11_17 model to detect and ultimately count cars. As mentioned in the Object Detection Basic article, please make sure you have installed the following dependencies: TensorFlow 2.x, OpenCV, and the TF Object Detection API. In case you have not installed or used these dependencies before, there is a nice tutorial by sentdex on how to install both TF and OpenCV.
Given an image or a video stream, an object detection model should be able to identify a set of objects as well as their positions within the image. In other words, the pretrained object detection model that we use in this article has been trained to detect the presence and location of multiple classes of objects. We will focus only on the model's ability to detect cars and ignore all other detection classes.
Let's first check our TensorFlow version. We plan to run the object detection model with TensorFlow 2.x. Remember that the original Object Detection API by Google was designed for TF 1.x and does not work with TF 2.x out of the box. In our case, we are working with version '2.1.0'.
import tensorflow as tf
tf.__version__
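Since everything below assumes a 2.x release, it can help to fail early when an older version is installed. The following is a minimal sketch: the helper name `is_tf2` is hypothetical and not part of TensorFlow, and it only parses a version string such as the one returned by `tf.__version__`.

```python
def is_tf2(version_string):
    """Return True if the major version of a string like '2.1.0' is at least 2."""
    major = int(version_string.split('.')[0])
    return major >= 2


# Example with the version used in this article and with an older 1.x release.
print(is_tf2('2.1.0'))   # True
print(is_tf2('1.15.0'))  # False
```

In a script, one could call `is_tf2(tf.__version__)` right after the import and stop with a clear message if it returns False.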
Let's jump right into the code. First, we import all required dependencies. (1) pathlib offers classes representing filesystem paths with semantics appropriate for different operating systems. (2) importlib has two purposes: it provides the implementation of the import statement (and thus, by extension, the __import__() function) in Python source code, and it exposes the components that implement import, making it easier for users to create their own custom objects (known generically as importers) to participate in the import process. (3) numpy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. (4) OpenCV is a library of programming functions mainly aimed at real-time computer vision. (5) The TF Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models.
import pathlib
import importlib
import numpy as np
#OpenCV
import cv2
#TensorFlow Object Detection API
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
Second, as we are using TensorFlow 2.x, we need to patch two references that the classical Object Detection API expects: the tf module used inside utils.ops and the location of gfile.
# Patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1
# Patch the location of gfile: expose tf.io.gfile under the old name tf.gfile
tf.gfile = tf.io.gfile
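The patching pattern itself is plain attribute assignment on a module-like object. Here is a minimal sketch with a stand-in object; the names `fake_tf` and `legacy_reader` are hypothetical and only mimic the situation above, they are not TensorFlow objects.

```python
import types

# Stand-in for a library that only offers the new location `io.gfile`.
fake_tf = types.SimpleNamespace(io=types.SimpleNamespace(gfile='new gfile module'))


def legacy_reader(lib):
    """Stand-in for older code that still looks up `lib.gfile`."""
    return lib.gfile


# Without the alias, the old lookup fails with an AttributeError.
try:
    legacy_reader(fake_tf)
    worked_before_patch = True
except AttributeError:
    worked_before_patch = False

# Alias the new location under the old name, as done for `tf.gfile` above.
fake_tf.gfile = fake_tf.io.gfile

print(worked_before_patch)     # False
print(legacy_reader(fake_tf))  # new gfile module
```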
We will then define five functions: (1) loading the model, (2) running inference on a single image, (3) reducing the detections to the car class, (4) running inference on the webcam stream, and (5) a main function that initializes the model.
We will start off by building the function that loads the model. For this, we make use of a pretrained model provided by the TensorFlow API. In addition, we specify the path to the label map that we will later use to map class ids to class names in the image / video stream.
def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir)/"saved_model"
    model = tf.saved_model.load(str(model_dir))
    model = model.signatures['serving_default']
    return model
# List of the strings that are used to add the correct label to each box.
PATH_TO_LABELS = 'data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
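To make the label mapping more concrete, here is a hand-written sketch of the structure that `category_index` has after loading: a dictionary keyed by class id, where each value holds the id and the display name. The three entries below are the first ids of the COCO label map; the variable `example_category_index` and the helper `class_name` are hypothetical names used only for illustration.

```python
# Hand-written excerpt that mirrors the structure of `category_index`.
example_category_index = {
    1: {'id': 1, 'name': 'person'},
    2: {'id': 2, 'name': 'bicycle'},
    3: {'id': 3, 'name': 'car'},
}


def class_name(index, class_id):
    """Look up the display name for a class id, or 'unknown' if it is missing."""
    return index.get(class_id, {}).get('name', 'unknown')


print(class_name(example_category_index, 3))   # car
print(class_name(example_category_index, 99))  # unknown
```

This is why the car detections are later selected with class id 3.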
The second function runs inference on a single image. It takes two arguments: the model that we run inference on and the input image. As we are working with TensorFlow, we need to convert the image to an input tensor. The model expects a batch of images, so we add a batch axis to the tensor. Further, we need to build our output dictionary. Remember that all outputs are batched tensors. Hence, we convert them to numpy arrays and take index [0] to remove the batch dimension. Note that detection_classes should be ints. Finally, we handle models that also return masks: if the output contains detection masks, we reframe the box masks to the size of the image.
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]
    # Run inference
    output_dict = model(input_tensor)
    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    return output_dict
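The two array operations in the function above, adding a batch axis and then stripping it again from the outputs, can be tried out without a model. The following sketch uses plain numpy and made-up values; `frame` and `fake_outputs` are hypothetical stand-ins for a webcam image and the model output.

```python
import numpy as np

# A fake 4x6 RGB frame standing in for a webcam image.
frame = np.zeros((4, 6, 3), dtype=np.uint8)

# Add the batch axis: (height, width, channels) -> (1, height, width, channels).
batched = frame[np.newaxis, ...]
print(batched.shape)  # (1, 4, 6, 3)

# Fake model outputs with a batch dimension and room for 5 detections,
# of which only the first 2 are valid.
num_detections = 2
fake_outputs = {
    'detection_scores': np.array([[0.9, 0.6, 0.0, 0.0, 0.0]], dtype=np.float32),
    'detection_classes': np.array([[3.0, 1.0, 0.0, 0.0, 0.0]], dtype=np.float32),
}

# Drop the batch dimension and keep only the valid detections.
trimmed = {key: value[0, :num_detections] for key, value in fake_outputs.items()}
trimmed['detection_classes'] = trimmed['detection_classes'].astype(np.int64)

print(trimmed['detection_classes'])  # [3 1]
```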
The third function reduces the detections of the TensorFlow API to a single class. Remember, we only want to detect cars with our model. Hence, we only return the entries of the output_dict that belong to the class we select by setting class_id.
def reduce_to_one_class(output_dict, class_id):
    indices = [i for i, x in enumerate(output_dict['detection_classes']) if x == class_id]
    return {'detection_classes': output_dict['detection_classes'][indices],
            'detection_boxes': output_dict['detection_boxes'][indices],
            'detection_scores': output_dict['detection_scores'][indices],
            'num_detections': len(indices)}
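To see the effect of the class reduction on concrete numbers, here is a small sketch with made-up detections. It uses a boolean mask instead of an index list, which is an alternative formulation and not the function from this article; the name `filter_by_class` and the values in `fake_output` are hypothetical.

```python
import numpy as np


def filter_by_class(output_dict, class_id):
    """Keep only the detections whose class equals class_id (boolean-mask variant)."""
    mask = output_dict['detection_classes'] == class_id
    return {
        'detection_classes': output_dict['detection_classes'][mask],
        'detection_boxes': output_dict['detection_boxes'][mask],
        'detection_scores': output_dict['detection_scores'][mask],
        'num_detections': int(mask.sum()),
    }


# Made-up detections: two cars (class 3) and one person (class 1).
fake_output = {
    'detection_classes': np.array([3, 1, 3]),
    'detection_boxes': np.array([[0.1, 0.1, 0.5, 0.5],
                                 [0.2, 0.2, 0.6, 0.6],
                                 [0.3, 0.3, 0.9, 0.9]]),
    'detection_scores': np.array([0.9, 0.8, 0.4]),
    'num_detections': 3,
}

cars = filter_by_class(fake_output, class_id=3)
print(cars['num_detections'])    # 2
print(cars['detection_scores'])  # [0.9 0.4]
```

Note that the reduction only filters by class. The second car has a score of 0.4 and is still returned; the score threshold is applied later, in the counting step.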
Now we will define our run_inference function, which uses the webcam stream as its input.
def run_inference(model):
    # Activate the video capture (device 0 is the default webcam).
    cap = cv2.VideoCapture(0)
    total_count = 0
    count_current_frame = 0
    count_before_frame = 0
    while True:
        (ret, image_np) = cap.read()
        if not ret:
            print("end of the video stream...")
            break
        input_frame = image_np
        # Actual detection.
        output_dict = run_inference_for_single_image(model, image_np)
        # Class id 3 is 'car' in the COCO label map.
        output_dict = reduce_to_one_class(output_dict, class_id=3)
        final_score = output_dict['detection_scores']
        count_current_frame = 0
        print('final_score: ', final_score)
        print('final_score.size: ', final_score.size)
        # Iterate over the scores and count every car detected with a score above 0.5.
        # The loop also covers the cases of zero or exactly one detection.
        for score in final_score:
            print('score: ', score)
            if score > 0.5:
                count_current_frame = count_current_frame + 1
        if count_before_frame < count_current_frame:
            total_count = total_count + count_current_frame
            print("New car(s)")
        elif count_before_frame == count_current_frame:
            print("Same car")
        elif count_before_frame > count_current_frame:
            print("Fewer car(s) / no car")
        count_before_frame = count_current_frame
        print('count ', count_current_frame)
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            line_thickness=4)
        # input_frame refers to the same array as image_np, so it shows the drawn boxes.
        cv2.imshow('object counting', input_frame)
        # Press 'q' to quit.
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    # Release the webcam and close the window.
    cap.release()
    cv2.destroyAllWindows()
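The counting part of the loop can be isolated and tested without a webcam. The sketch below is a variant, not the exact logic of the loop above: when the number of cars rises between two frames, it adds only the increase to the running total instead of the full current count. The helper names `count_confident` and `update_total` and the per-frame scores are hypothetical.

```python
def count_confident(scores, threshold=0.5):
    """Count how many detection scores exceed the threshold."""
    return sum(1 for score in scores if score > threshold)


def update_total(total_count, count_before_frame, count_current_frame):
    """Add only the newly appeared detections to the running total."""
    if count_current_frame > count_before_frame:
        total_count += count_current_frame - count_before_frame
    return total_count


# Made-up per-frame car scores for four consecutive frames.
frames = [[0.9], [0.9, 0.7], [0.8, 0.3], []]

total = 0
before = 0
for scores in frames:
    current = count_confident(scores)
    total = update_total(total, before, current)
    before = current

print(total)  # 2
```

Keep in mind that this approach only compares the number of detections from frame to frame. It does not track individual cars, so a car that leaves while another one enters in the same frame would not be counted.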
Lastly, we will initialize our model.
def main():
    model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
    detection_model = load_model(model_name)
    run_inference(detection_model)
if __name__=="__main__":
    main()
Give it a second to open up the webcam window. After the window opens up, you should be able to do live inference. The object detection should now be able to detect and count cars. Press q in the window to stop the stream.
In this simple tutorial, we have used the TensorFlow Object Detection API to run live inference on a webcam stream for one particular object class, including counting the detections.